Method for design validation using retiming

ABSTRACT

A method for derivation and abstraction of test models for validation of industrial designs using guided simulation is described. The method employs automatic abstractions for the test model which reduce its complexity while preserving the class of errors that can be detected by a transition tour. The method for design validation comprises generating a state-based test model of the design; abstracting said test model by retiming and latch removal; and applying a validation technique on the abstracted test model. First, the number of internal (non-peripheral) latches in a design is minimized via retiming using a method of Maximal Peripheral Retiming (MPR). According to the MPR method, internal latches are retimed to the periphery of the circuit. Subsequently, all latches that can be retimed to the periphery are automatically abstracted in the test model. The validation technique may comprise model checking, invariant checking, or guided simulation using test sequences generated from the abstracted test model.

I. DESCRIPTION OF THE INVENTION

IA. Field of the Invention

This invention relates to a method for abstraction of test models for validation of industrial designs using guided simulation, model checking and invariant checking. Specifically, this invention relates to a method for automatic abstractions for the test model which reduce its complexity while preserving the class of errors that can be detected by a reachability test. The invention comprises a method of Maximal Peripheral Retiming (MPR), where the number of internal (non-peripheral) latches is minimized via retiming and some latches at the periphery are removed. The invention is embodied in a method for generating test models that has been shown to be practical by providing a detailed case study, as well as experimental results of applying this abstraction on a set of benchmark circuits.

IB. Background of the Invention

Large industrial designs are commonly validated using test sequences. In a typical methodology, a test model is derived from the design, and test sequences are generated from it by using formal verification techniques. These test sequences, which satisfy certain coverage criteria, are then used for functional simulation. A popular conventional method comprises generating test sequences by performing a transition tour on the state space of the test model, thereby guaranteeing complete transition coverage for it. In practice, such a method has been shown to be effective in uncovering errors that are otherwise difficult to find. See R. C. Ho, C. H. Yang, M. A. Horowitz, and D. L. Dill, “Architecture validation for processors”, Proceedings of the 22nd Annual International Symposium on Computer Architecture, June 1995.

A primary conventional methodology for design validation is simulation of functional models of the design. There are at least two major problems with using such an approach. The first is that, for any reasonable coverage of possible behaviors of the design, a substantial amount of computational resources is required. The second is that such an approach lacks any formal measure of this coverage.

However, exhaustive simulation is beyond practical limits. Therefore, enhancing the quality of the simulation test set becomes important. In response to this, formal verification is emerging as a no-test-vectors validation methodology. In such a formal verification method, a formal proof of correctness covers the entire space of tests. But, while recent advances in formal verification are promising, they tend to have limited applicability.

A key barrier to the complete adoption of formal verification techniques is the very large state space associated with most designs. The use of implicit state traversal techniques has significantly extended these limits. For more details, see J. R. Burch, E. M. Clarke, D. E. Long, K. L. McMillan, and D. L. Dill, Symbolic model checking for sequential circuit verification, IEEE Transactions on Computer-Aided Design, 13(4):401-424, April 1994; O. Coudert, C. Berthet, and J. C. Madre, Verification of sequential machines using Boolean functional vectors, L. M. J. Claesen, editor, Proceedings of the IFIP International Workshop on Applied Formal Methods for Correct VLSI Design, Belgium, 1989, volume 11, pages 179-196, North-Holland, Amsterdam, 1989; and H. J. Touati, H. Savoj, B. Lin, R. K. Brayton, and A. Sangiovanni-Vincentelli, Implicit state enumeration of finite state machines using BDDs, Proceedings of the IEEE International Conference on Computer-Aided Design, pages 130-133, IEEE Computer Society Press, Los Alamitos, Calif., 1990.

Nonetheless, there continues to be a significant gap between the capabilities of conventional formal verification techniques and the practical requirements imposed on them.

Recently there has been great interest in hybrid techniques. These hybrid techniques combine the relative strengths of formal verification on one hand and simulation on the other. A popular hybrid methodology is based on using formal techniques to ensure some form of simulation coverage. It comprises first deriving a formal model of the design, typically a finite state model (FSM), hereafter called the test model. Then techniques similar to those used in formal verification (e.g. symbolic state space traversal) are used for generation of a set of test sequences. Finally, this test set is used as stimuli for functional simulation of the entire design, which can be used for either specification validation, or for comparison of an implementation against a given specification. A schematic of such an application is shown in FIG. 1.

The test set is selected such that it has certain coverage properties with respect to the test model. These properties could include, for example, coverage of each state, or coverage of each transition, or coverage of each pair-arc. For more details, see Y. Hoskote, D. Moundanos, and J. A. Abraham, Automatic extraction of the control flow machine and application to evaluating coverage of verification vectors, Proceedings of the IEEE International Conference on Computer Design, pages 532-537, October 1995; H. Iwashita, S. Kowatari, T. Nakata, and F. Hirose, Automatic test program generation for pipelined processors, Proceedings of the IEEE International Conference on Computer Design, pages 580-583, October 1994; D. Geist, M. Farkas, A. Landver, Y. Lichtenstein, S. Ur, and Y. Wolfsthal, Coverage-directed test generation using symbolic techniques, Proceedings of the Conference on Formal Methods in CAD, pages 143-158, November 1996; and R. C. Ho, C. H. Yang, M. A. Horowitz, and D. L. Dill, Architecture validation for processors, Proceedings of the 22nd Annual International Symposium on Computer Architecture, June 1995.

Several conventional variations on aspects of the hybrid theme have also been proposed. In one such variation, the coverage criteria used in the hybrid theme are used to evaluate the quality of a given test set or to drive the search for additional test sequences that fill in the gaps. Instead of building a separate test model, such measures are also used directly during simulation to provide partial coverage of the state space for checking invariants. For details, see J. Yuan, J. Shen, J. Abraham, and A. Aziz, On combining formal and informal verification, Proceedings of the International Conference on Computer-Aided Verification, volume 1254 of Lecture Notes in Computer Science, pages 376-387, Springer-Verlag, New York, June 1997.

Other related techniques use property-specific models, instruction templates, and HDL descriptions for generation of the test sets. See D. Lewin, D. Lorenz, and S. Ur, A methodology for processor implementation verification, Proceedings of the Conf. on Formal Methods in CAD, pages 126-142, November 1996; A. K. Chandra, V. S. Iyengar, D. Jameson, R. Jawalekar, I. Nair, B. Rosen, M. Mullen, J. Yoor, R. Armoni, D. Geist, and Y. Wolfsthal, Avpgen, a test case generator for architecture verification, IEEE Transactions on VLSI Systems, 6(6), June 1995; and K. Cheng and A. Krishnakumar, Automatic functional test generation using the extended finite state machine model, Proceedings of the 30th ACM/IEEE Design Automation Conference, pages 86-91, June 1993.

While most prior art methodologies based on simulation coverage have been shown useful for detection of design errors, there is no teaching on how coverage measures on the test model translate to coverage of design errors.

There has been some recent work on identifying requirements under which a transition tour (a test set that covers each transition) on a test model can be used for covering all design errors with respect to a given specification. See A. Gupta, S. Malik, and P. Ashar, Toward formalizing a validation methodology using simulation coverage, Proceedings of the 34th ACM/IEEE Design Automation Conference, pages 740-745, June 1997; and A. T. Dahbura, K. K. Sabnani, and M. U. Uyar, Formal methods for generating protocol conformance test sequences, Proceedings of the IEEE, 78(8):1317-1326, August 1990.

Furthermore, the derivation of the test model remains largely ad hoc. Importantly, very few formal guidelines have been provided for such a derivation. For example, for processor validation, the datapath in the design is typically abstracted out and only the controller portion is retained in the test model. Beyond the datapath abstraction, most efforts in this direction have gone only so far as to designate certain state bits as more “interesting”, thereby ensuring the associated partial coverage. For details, see D. Geist, M. Farkas, A. Landver, Y. Lichtenstein, S. Ur, and Y. Wolfsthal, Coverage-directed test generation using symbolic techniques, Proceedings of the Conf. on Formal Methods in CAD, pages 143-158, November 1996; and J. Yuan, J. Shen, J. Abraham, and A. Aziz, On combining formal and informal verification, Proceedings of the Int. Conf. on Computer-Aided Verification, volume 1254 of Lecture Notes in Computer Science, pages 376-387, Springer-Verlag, New York, June 1997.

There has been some work on state space abstraction based on equivalence of output control signals as seen by the datapath. For details, see R. C. Ho and M. A. Horowitz, Validation coverage analysis for complex digital designs, Proceedings of the IEEE International Conference on Computer-Aided Design, pages 146-151, November 1996. The same work also describes the use of don't-cares obtained by automatic HDL description analysis, and the use of over-approximation for handling state space explosion. However, the underlying goal was still improvement in the state/transition coverage metric, without relating it to design error coverage in any way.

It is important to note that the success of abstraction in the area of formal verification is predicated on a clear formulation of the correctness criteria. For details, see R. P. Kurshan, Formal verification in a commercial setting, Proceedings of the ACM/IEEE Design Automation Conference, pages 258-262, June 1997; D. E. Long, Model Checking, Abstraction and Modular Verification, PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pa., July 1993; and P. Wolper, Expressing interesting properties of programs in propositional temporal logic, Proceedings of the Thirteenth Annual ACM Symposium on Principles of Programming Languages, pages 184-192, ACM, New York, January 1986. Once such criteria exist, it is possible to reason about their preservation by use of appropriate abstractions. Since most efforts using guided simulation have concentrated only on state/transition coverage, without relating these to error coverage of the original design, there is hardly any notion of preserving correctness. This, in turn, has made it hard to use abstraction effectively, except to provide partial coverage of the states/transitions.

Despite the above advances, the prior art has failed to provide a method for deriving and abstracting test models suitable for validation of large industrial designs using guided simulation.

II. SUMMARY OF THE INVENTION

The present invention provides a notion of correctness of the abstraction. Correctness of the abstraction implies completeness of the transition tour error coverage of the abstract test model with respect to the original design. Such an abstraction is correct if it preserves coverage of those errors that can be captured by a transition tour. In other words, the test sequences generated from a transition tour on the abstract test model should cover the complete set of those design errors that are covered by test sequences generated from any transition tour on the original design.

A transition tour methodology is used in the present invention since it is the most prevalent mode of generating test sequences. This notion can potentially be extended to other modes and applications as well.

The invention also includes applications to model checking and invariant checking. An error is detected in model checking or invariant checking by demonstrating the reachability of a particular state which causes incorrect behavior. These states may cause incorrect behavior by virtue of their presence in undesirable loops or because they cause specific logic expressions to have incorrect values. A property of the retiming-based abstraction technique is that reachability of states is preserved. Consequently, an error that was detected as a result of performing model checking on the original model will also be detected by model checking on the abstract model.

The applicability of invariant checking follows from the same reasoning as for model checking. An invariant is a property (a logic expression or a more complex relationship between variables) that should hold at all times. Checking an invariant also involves ensuring that states that can cause the property to be violated are never reached. The method of the present invention also works for reachability in general.

In particular, the use of Maximal Peripheral Retiming (MPR) as an abstraction which satisfies this correctness property under certain conditions is provided. In general terms, an MPR is a retiming where as many state elements as possible are moved to the periphery of the circuit. Consequently, there are as few internal state elements as possible. In the present specification the term “latches” is used to refer to all forms of state elements (latches/registers/flipflops).

Once MPR has been done, latches that are at the periphery of the circuit are candidates for removal in order to obtain the abstract test model, thereby potentially reducing its complexity. Intuitively, a subcircuit consisting only of peripheral latches can be safely removed if it contains no errors, because it does not affect the detection of errors in the rest of the circuit. Under this condition, it is shown that the abstraction is correct, i.e. any design error that can be uncovered using the original design can also be uncovered using the simpler abstract test model. The practical importance of this abstraction is in the significant reduction in the number of latches, and thereby the complexity of validation.

To overcome the problems in conventional methodologies, it is an objective of this invention to provide a method for deriving and abstracting test models suitable for validation of industrial designs.

Specifically, it is an object of this invention to provide a method for automatic abstraction of the test model that reduces its complexity while preserving the class of errors that can be detected by a reachability analysis.

To achieve the objects of the present invention there is provided a method for design validation comprising: generating a state-based test model of the design; retiming said test model to produce a retimed test model; abstracting said retimed test model to produce an abstracted test model; and applying a validation technique on the abstracted test model.

Preferably the validation technique used is model checking.

Preferably the validation technique used is invariant checking.

Preferably the validation technique used is guided simulation using test sequences generated from the abstracted test model.

Preferably the test model is retimed by Maximal Peripheral Retiming, which minimizes the number of internal latches.

Preferably the retimed test model is abstracted by removing correctly positioned peripheral latches.

Still preferably, the test sequence generation is done based on transition tour coverage for guided simulation.

Still preferably, the method further comprises using a conventional retiming procedure for retiming of said test model, wherein in said conventional retiming procedure the bus width of each input and output is taken to be zero.

Still preferably, the method further comprises using a conventional retiming procedure for retiming of said test model wherein:

-   a dummy node D₁ is added to the circuit with fanout to each of the primary inputs I_(i); and
-   a dummy node D₀ is added with no fanin and a single fanout to D₁,
-   wherein buswidths corresponding to edges are as follows:
    -   for edge (D₀, D₁) buswidth is 0,
    -   for each edge (D₁, I_(i)) buswidth is infinite,
    -   for outer periphery edges buswidth is 0, and
    -   for all other edges, buswidth is the same as in the original circuit.
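By way of non-limiting illustration, the dummy-node construction and buswidth assignment above may be sketched as follows. The sketch assumes the circuit is given as a map from directed edges to buswidths; the function name `augment_for_mpr`, the node labels `"D0"` and `"D1"`, and the requirement that the caller supplies the set of outer periphery edges are all assumptions made for illustration and are not taken from the specification.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def augment_for_mpr(buswidth: Dict[Edge, float],
                    primary_inputs: Iterable[str],
                    outer_periphery: Set[Edge]) -> Dict[Edge, float]:
    """Return a new edge->buswidth map with dummy nodes D0 and D1 added.

    Hypothetical helper: `outer_periphery` is supplied by the caller,
    because this sketch does not try to infer which edges are peripheral.
    """
    # Outer periphery edges get buswidth 0; all other edges keep their width.
    augmented = {edge: (0 if edge in outer_periphery else width)
                 for edge, width in buswidth.items()}
    # D0 has no fanin and a single fanout to D1, with buswidth 0.
    augmented[("D0", "D1")] = 0
    # D1 fans out to every primary input with infinite buswidth.
    for pi in primary_inputs:
        augmented[("D1", pi)] = math.inf
    return augmented


# Hypothetical two-input circuit with a single gate g and one output O1.
example = augment_for_mpr(
    {("I1", "g"): 8, ("I2", "g"): 4, ("g", "O1"): 8},
    primary_inputs=["I1", "I2"],
    outer_periphery={("g", "O1")},
)
```

The augmented map could then be handed to any conventional retiming procedure that minimizes buswidth-weighted latch count; that procedure itself is outside the scope of this sketch.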

Still preferably, all correctly positioned output peripheral latches, and an equal number of correctly positioned input peripheral latches on all inputs, are removed.

III. BRIEF DESCRIPTION OF THE DRAWINGS

The above objectives and advantages of the present invention will become more apparent by describing in detail preferred embodiments thereof with reference to the attached drawings in which:

FIG. 1 shows a diagram illustrating a validation methodology using guided simulation.

FIG. 2 shows examples of legal and illegal retimings.

FIG. 3 shows examples of peripheral latches.

FIG. 4 shows an example illustrating the limitations of transition tours.

FIG. 5 shows diagrams illustrating correctness of A_(eq) abstraction.

FIG. 6 shows diagrams illustrating correctness of A_(neq) abstraction.

FIG. 7 shows an example circuit with multiple clocks/phases.

FIG. 8 shows a diagram illustrating initial test model for the DLX.

FIG. 9 shows latches in the DLX fetch stage.

FIG. 10 shows outputs from the DLX memory stage.

FIG. 11 shows a diagram illustrating peripherally retimable latches in the DLX.

FIG. 12 shows examples of unretimable latches in the DLX.

FIG. 13 shows Table 1 showing reduction in the number of latches using the techniques of the present invention.

IV. DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention provides methods for derivation of the test model and the use of design abstractions to reduce its complexity.

The application of MPR according to the embodiment of the present invention will be illustrated below on a case study of the controller of a pipelined implementation of the DLX processor. Empirical evidence of its practical efficacy will be demonstrated using a set of benchmark circuits.

First, it will be shown how retiming is used to abstract state from the test model of a circuit according to an embodiment of the present invention. This is followed by the formulation of the general MPR problem and its solution, as defined by an embodiment of the present invention. Next, a detailed case study of a DLX processor, followed by experimental results of applying MPR to a set of benchmark circuits, will be presented.

IVA. Retiming for Abstraction

Retiming is the process of repositioning latches across the combinational logic in a sequential circuit so as to minimize the clock period, the number of latches, or to meet a given clock period while minimizing the number of latches. Polynomial time algorithms for these were introduced by Leiserson and Saxe. For details on these algorithms, see Charles E. Leiserson and James B. Saxe, Retiming Synchronous Circuitry, Algorithmica, 6(1):5-36, 1991. By definition, retiming preserves the I/O behavior of the original sequential circuit. FIG. 2 gives examples of legal and illegal retimings.
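By way of non-limiting illustration, the legality condition of a retiming may be sketched in the edge-weight formulation of Leiserson and Saxe, in which each edge (u, v) carries a latch count w(u, v) and a retiming assigns an integer lag r to each node, giving the retimed count w(u, v) + r(v) - r(u). A retiming is legal only if no edge ends up with a negative latch count. The function name `retime` and the three-edge example circuit are assumptions made for illustration and do not correspond to FIG. 2.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def retime(weights: Dict[Edge, int], lag: Dict[str, int]) -> Dict[Edge, int]:
    """Apply lag values to a latch-count graph; nodes absent from `lag` have lag 0."""
    retimed = {}
    for (u, v), w in weights.items():
        new_w = w + lag.get(v, 0) - lag.get(u, 0)
        if new_w < 0:
            # A negative latch count has no physical meaning: illegal retiming.
            raise ValueError(f"illegal retiming: edge {u}->{v} would carry {new_w} latches")
        retimed[(u, v)] = new_w
    return retimed


# Hypothetical circuit: in -> g1 -> g2 -> out, with one latch between g1 and g2.
circuit = {("in", "g1"): 0, ("g1", "g2"): 1, ("g2", "out"): 0}

# Lag 1 on g1 moves the latch from the output of g1 to its input (legal).
moved = retime(circuit, {"g1": 1})
```

In this sketch the primary input and output nodes keep lag 0, which reflects the requirement that the I/O behavior be preserved. Calling `retime(circuit, {"g1": 2})` raises `ValueError`, since it would require removing a latch that does not exist.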

IVA1. Definitions

For the purpose of describing the preferred embodiments of the present invention, the following definitions will be used:

Definition 1

An input peripheral latch p_(i) is a latch whose fanin q is either a primary input or the output of another input peripheral latch, such that q fans out only to p_(i).

Intuitively, input peripheral latches are located at the input periphery of a circuit, such that no fanin signal to these latches is directly used by the rest of the logic. In other words, the application of primary inputs is merely delayed by these latches.

Definition 2

An output peripheral latch p_(o) is a latch which fans out only to either primary outputs or other output peripheral latches.

Output peripheral latches are located at the output periphery of a circuit, such that no fanout signal from these latches is directly used by the rest of the logic. Again, these latches merely delay the availability of primary outputs.

Definition 3

An internal latch l_(int) is a latch which is neither an input peripheral latch, nor an output peripheral latch.

Clearly, these latches determine the reachability of the state space associated with a circuit. As an example, in the circuit shown in FIG. 3, latches L1, L2, L3 are input peripheral latches, latches L5, L6, L8 are output peripheral latches, and L4, L7 are internal latches.
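By way of non-limiting illustration, Definitions 1 through 3 may be applied mechanically to a netlist as sketched below. The representation (a `kind` map labelling each node as `"pi"`, `"po"`, `"latch"` or `"gate"`, and a `fanout` map), the function name `classify_latches`, and the small example netlist are assumptions made for illustration; the example is not the circuit of FIG. 3.

```python
from typing import Dict, List, Set, Tuple


def classify_latches(kind: Dict[str, str],
                     fanout: Dict[str, List[str]]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (input peripheral, output peripheral, internal) latch sets."""
    fanin: Dict[str, List[str]] = {n: [] for n in kind}
    for src, dsts in fanout.items():
        for dst in dsts:
            fanin[dst].append(src)
    latches = [n for n in kind if kind[n] == "latch"]

    input_p: Set[str] = set()
    output_p: Set[str] = set()
    changed = True
    while changed:  # iterate to a fixed point, since the definitions are recursive
        changed = False
        for l in latches:
            # Definition 1: fanin q is a primary input or an input peripheral
            # latch, and q fans out only to this latch.
            if l not in input_p and len(fanin[l]) == 1:
                q = fanin[l][0]
                if (kind[q] == "pi" or q in input_p) and fanout.get(q, []) == [l]:
                    input_p.add(l)
                    changed = True
            # Definition 2: fans out only to primary outputs or to other
            # output peripheral latches.
            outs = fanout.get(l, [])
            if l not in output_p and outs and all(
                    kind[o] == "po" or o in output_p for o in outs):
                output_p.add(l)
                changed = True
    # Definition 3: everything else is internal.
    internal = set(latches) - input_p - output_p
    return input_p, output_p, internal


# Hypothetical netlist: a -> L1 -> L2 -> g -> L3 -> h -> L4 -> y, with b -> g.
kind = {"a": "pi", "b": "pi", "y": "po", "g": "gate", "h": "gate",
        "L1": "latch", "L2": "latch", "L3": "latch", "L4": "latch"}
fanout = {"a": ["L1"], "L1": ["L2"], "L2": ["g"], "b": ["g"],
          "g": ["L3"], "L3": ["h"], "h": ["L4"], "L4": ["y"]}
input_p, output_p, internal = classify_latches(kind, fanout)
```

In this example L1 and L2 merely delay input a, L4 merely delays output y, and L3 sits between two gates and is therefore internal.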

IVA2. Design Errors and Transition Tours

A design implementation is considered to have an error with respect to its specification if the implementation produces an incorrect output for some input sequence. In general, a transition tour cannot capture all errors, since it covers all single transitions only, and not all transition sequences.

For example, consider the state transition graph shown in FIG. 4. Suppose there is an error in the transition from State 2 to 3 on input a, where it is incorrectly implemented as a transition from State 2 to 3′. Let the transitions on input b, from State 3 to 4 and from State 3′ to 4′, result in different outputs during simulation. Also, let the transitions on input c, from State 3 to 5 and from State 3′ to 5, result in the same outputs during simulation. Therefore, if a transition tour uses the sequence <a, b>, the error on the transition with input a can get detected on the next transition itself. On the other hand, if a transition tour uses the sequence <a, c>, the error will not get detected on the next transition, and may never get detected at all. This illustrates the basic limitation of using transition tours: an error may get detected several transitions after it is excited, and only along a specific path in the state transition graph. If this path is not selected in the transition tour, the error will not be covered. Furthermore, depending on what particular paths are selected, different transition tours may cover different sets of errors.
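By way of non-limiting illustration, the scenario just described may be reproduced with two small Mealy machines, one for the specification and one for the faulty implementation. Only the transitions named in the text are modelled; the output labels (`"o_a"`, `"o_b"`, `"o_b_wrong"`, `"o_c"`) and the function names are assumptions made for illustration. The premise used is the one stated above: the transitions on input b produce different outputs, while those on input c produce the same output.

```python
from typing import Dict, List, Tuple

Machine = Dict[Tuple[str, str], Tuple[str, str]]  # (state, input) -> (next state, output)


def run(machine: Machine, start: str, inputs: List[str]) -> List[str]:
    """Simulate a Mealy machine and return the observed output sequence."""
    state, outputs = start, []
    for symbol in inputs:
        state, out = machine[(state, symbol)]
        outputs.append(out)
    return outputs


# Specification: 2 --a--> 3, then 3 --b--> 4 or 3 --c--> 5.
spec: Machine = {
    ("2", "a"): ("3", "o_a"),
    ("3", "b"): ("4", "o_b"),
    ("3", "c"): ("5", "o_c"),
}
# Faulty implementation: 2 --a--> 3' with the same output on a.
impl: Machine = {
    ("2", "a"): ("3'", "o_a"),
    ("3'", "b"): ("4'", "o_b_wrong"),  # differs from the specification
    ("3'", "c"): ("5", "o_c"),         # indistinguishable from the specification
}


def detects(sequence: List[str]) -> bool:
    """True if simulating `sequence` from State 2 exposes an output mismatch."""
    return run(spec, "2", sequence) != run(impl, "2", sequence)
```

With these definitions, `detects(["a", "b"])` is true and `detects(["a", "c"])` is false: both sequences exercise the erroneous transition, but only the path through input b makes the error observable.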

IVA3. Correctness of Abstraction

It should be noted that the present invention is not restricted to a particular type of transition tour. In order to not tie the analysis to a particular choice of a transition tour, for the rest of this specification, the focus is on those errors that can be covered by any transition tour of a given design implementation.

Based on the above observations, it is clear that such an error can be detected as a difference in the observed output on a transition from a state, regardless of how that state was reached. Thus, it can be shown that reachability of states is preserved by the abstraction used in the present invention. The analysis is also not restricted to systems with special properties that allow all errors to be covered by a transition tour. See A. Gupta, S. Malik, and P. Ashar, Toward formalizing a validation methodology using simulation coverage, Proceedings of the 34th ACM/IEEE Design Automation Conference, pages 740-745, June 1997.

The following definitions are used as criteria for correctness of an abstraction in this context:

Definition 4

Transition tour error coverage completeness: A model T has transition tour error coverage completeness with respect to another model D, if all errors in D which can be covered by any transition tour on D, are also errors in T and can be covered by some transition tour on T.

Definition 5

An abstraction A is correct if the abstract test model T=A(D) has transition tour error coverage completeness with respect to the original implementation D.

IVA4. Retiming and Removal of Peripheral Latches

To achieve its goals, the present invention focuses on a special class of retiming called Maximal Peripheral Retiming (MPR). The formal statement of the problem, and an outline for its solution are described below. Once MPR has been performed on the given design, the following two kinds of abstraction are considered for obtaining the test model:

-   A_(eq): Removal of all output peripheral latches, and removal of an equal number of input peripheral latches across all inputs.
-   A_(neq): Removal of all output peripheral latches, and removal of all input peripheral latches (a potentially non-equal number) across all inputs.
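By way of non-limiting illustration, the difference between the two abstractions on the input side may be sketched as a count of latches removed per input. The sketch assumes that, for A_(eq), the equal number removed is the largest one possible, i.e. the minimum input peripheral latch depth over all inputs; the specification only requires the number to be equal, so this choice, the function name `removed_input_latches`, and the input names are assumptions made for illustration.

```python
from typing import Dict


def removed_input_latches(depth: Dict[str, int], mode: str) -> Dict[str, int]:
    """Number of input peripheral latches removed from each input.

    depth: input name -> number of input peripheral latches after MPR.
    mode:  "eq" for A_eq, "neq" for A_neq.
    """
    if mode == "eq":
        # Assumption: remove the largest equal number available on every input.
        n = min(depth.values(), default=0)
        return {name: n for name in depth}
    if mode == "neq":
        # A_neq removes every input peripheral latch, whatever the depth.
        return dict(depth)
    raise ValueError(f"unknown abstraction mode: {mode}")


# Hypothetical post-MPR depths for three inputs.
depth = {"opcode": 2, "reg_addr": 1, "stall": 3}
eq_removed = removed_input_latches(depth, "eq")
neq_removed = removed_input_latches(depth, "neq")
```

Under these assumptions A_(eq) leaves one latch on `opcode` and two on `stall` in the test model, whereas A_(neq) removes all six and therefore yields the smaller model.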

It is shown herein that these abstractions are correct according to the criteria described in the previous section, i.e. the abstractions preserve detection of errors covered by any transition tour in the original design implementation. This is true under certain conditions. The intuitive explanation is described here, with proofs to follow in the next section.

Note that the I/O-preserving nature of a retiming transformation itself guarantees that an error observed on a transition from a reachable state s in the original circuit will be preserved as a corresponding error from an equivalent state s′ in the retimed circuit. The main insight here is that the removal of a peripheral latch (either input or output) does not affect the presence of such an error, provided the latch was correctly positioned to start with. This is because, in a correct design, the position of a peripheral latch implies that it cannot affect the value of an output (only its timing), which can be determined strictly from the rest of the circuit. Thus, all errors in the subcircuit with internal latches are preserved by the abstract circuit. Furthermore, a transition tour on the abstract circuit will eventually reach (given the initial conditions) and cover the erroneous transition. More formally, it can be said that a peripheral latch is correctly positioned if it has been separately validated that this position of the latch at the circuit periphery is consistent with intended behavior.

In some sense, the burden of checking design correctness is decomposed into:

-   (i) detecting errors in the peripheral latches, and
-   (ii) detecting errors in the rest of the circuit.

It can be shown that the former task can be handled separately, and in some cases more efficiently, than the latter.

Consider the case where an RTL specification is available. In this case, first perform an MPR on the specification. As described later, one can arrange it so that such a retiming results in a unique configuration of the peripheral latches. Thus, the peripheral latches of the retimed implementation after MPR contain no errors only if their configuration matches exactly the configuration of the peripheral latches in the retimed specification after MPR. In fact, if the match fails, one can easily obtain a counter-example which exhibits the difference between the two.

For cases where an RTL specification is not available, one could still use the knowledge of the designer/verification expert in certifying that the positioning of the peripheral latches is correct. In practice, especially in pipelined design implementations, many peripheral latches are already at the periphery in the original design itself, i.e. without having performed MPR. For example, as described in our case study of the DLX controller (Section IVC), 63 of the final 72 peripheral latches were already at the periphery before MPR.

In most cases, the correctness of these latches is easy to justify intuitively. However, the situation is somewhat trickier for latches that are moved to the periphery from some internal position after performing MPR. In these cases, one has to depend on input from either the designer, or some higher level analysis. For example, if a behavior level specification is available, one can perform an analysis for each output to capture its dependence on the relative time instances of each input (rather than its functional dependence) in order to infer some structural latch relationship in the corresponding RTL description.

Basically, it is first ensured that the peripheral latches are positioned correctly. This can be done either automatically for RTL specifications by using MPR itself, or manually through input from the designer. Then, the remaining circuit is handled by using the same transition tour techniques as before, but with potentially reduced complexity due to a smaller model.

IVA5. Proof of Correctness

The following Theorems 1-3 provide a basis for the present invention and its embodiments:

Theorem 1

Removal of correctly positioned output peripheral latches preserves reachability and thereby error detection.

Proof:

The only purpose served by the output peripheral latches is to buffer the primary outputs, which affects only the timing when the outputs are ready to be observed during simulation. In particular, for an output O from which n_(po) output peripheral latches have been removed, the result should be observed with a delay of n_(po) cycles during simulation on the original design. It is clear that removal of these latches has no impact on either the state space visited during the transition tour, or on its input sequences, thereby preserving reachability and detection of errors.

Theorem 2

Removal of an equal number of correctly positioned input peripheral latches across all inputs preserves error detection.

Proof:

For the abstract model, let n_(pi) be the number of input peripheral latches removed from each input. Given m inputs, let I_(i,j), 1≦i≦m, 1≦j≦n_(pi), denote the initial value on the j^(th) input peripheral latch for the i^(th) input, as shown in FIG. 5. Note that the circuit blocks marked C and C′ in the two models are identical. Let state s be reachable by an input sequence Σ in the original model. Then, there exists an equivalent state s′ reachable in the abstract model by the input sequence Σ′=σ₁, σ₂, . . . , σ_(n_(pi)), Σ, where σ_(j) is the input vector I_(1,j), I_(2,j), . . . , I_(m,j). Since s and s′ are equivalent, if there is an error from state s on input a, there will be an error from state s′ on input a. Furthermore, since all reachable states and all transitions from those states are covered by a transition tour, this error will be detected by any transition tour on the abstract model with σ₁, σ₂, . . . , σ_(n_(pi)) as a prefix. Finally, during simulation on the original design, Σ′ should be padded by n_(pi) dummy inputs at the end, in order to account for the delay due to the original input peripheral latches with respect to the observed outputs.
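By way of non-limiting illustration, the argument of Theorem 2 may be checked on a toy model. The sketch assumes a hypothetical core C (a running sum modulo 8 whose output is its state), models the original design as that core preceded by a pipeline of input vectors, and models the abstract design as the core alone. It further assumes that the first initial vector in the list is the one nearest the core and is therefore consumed first. None of these choices comes from the specification.

```python
from collections import deque
from typing import List, Tuple

Vector = Tuple[int, ...]


def core_step(state: int, vec: Vector) -> Tuple[int, int]:
    """Hypothetical core C: running sum modulo 8; the output equals the new state."""
    new_state = (state + sum(vec)) % 8
    return new_state, new_state


def run_abstract(inputs: List[Vector], state: int = 0) -> List[int]:
    """Abstract model: the core alone, with no input peripheral latches."""
    outs = []
    for vec in inputs:
        state, out = core_step(state, vec)
        outs.append(out)
    return outs


def run_original(inputs: List[Vector], initial: List[Vector], state: int = 0) -> List[int]:
    """Original model: the core sees each input delayed by len(initial) cycles."""
    pipe = deque(initial)  # initial[0] reaches the core first
    outs = []
    for vec in inputs:
        pipe.append(vec)
        state, out = core_step(state, pipe.popleft())
        outs.append(out)
    return outs


sigma = [(1, 0), (0, 1), (1, 1)]       # input sequence for the original model
initial = [(1, 1), (0, 0)]             # initial latch contents, n_pi = 2
dummy = (0, 0)                         # padding vectors; their values never reach the core

original_outputs = run_original(sigma + [dummy] * len(initial), initial)
abstract_outputs = run_abstract(initial + sigma)
```

Under these assumptions the two output sequences coincide, which mirrors the statement that the initial latch values act as a prefix of the abstract sequence and that simulation on the original design needs n_(pi) trailing dummy inputs.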

Theorem 3

Removal of all correctly positioned input peripheral latches preserves error detection for the circuit.

Proof:

For the abstract model, let n_(i) be the number of input peripheral latches removed from the i^(th) input. Given m inputs, let I_(i,j), 1≦i≦m, 1≦j≦n_(i), denote the initial value on the j^(th) input peripheral latch for the i^(th) input, as shown in FIG. 6. Again, note that the circuit blocks marked C and C′ in the two models are identical.

Now consider a state s reachable by an input sequence Σ=σ₁, σ₂, . . . , σ_(r) in the original model, where each σ_(k)=σ_(1,k), σ_(2,k), . . . , σ_(m,k) denotes the vector of inputs 1 through m. Due to the equivalence of blocks C and C′, there exists an equivalent state s′ reachable in the abstract model by the input sequence Σ′=σ′₁, σ′₂, . . . , σ′_(r+max(n_(i))), where σ′_(k)=σ′_(1,k), σ′_(2,k), . . . , σ′_(m,k) is the input vector constructed in such a way that:

$$
\sigma'_{i,k} =
\begin{cases}
I_{i,k} & \text{if } k \le n_i \\
\sigma_{i,\,k-n_i} & \text{if } n_i < k \le r + n_i \\
-\ \text{(don't care)} & \text{if } k > r + n_i
\end{cases}
$$

Since s and s′ are equivalent, if there is an error from state s on input a, there will be an error from state s′ on input a. Furthermore, since all reachable states and all transitions from those states are covered by a transition tour, this error will be detected by a transition tour (with the appropriate prefix) on the abstract model. Again, during simulation on the original design, Σ′ should be padded by max(n_(i)) dummy inputs at the end, in order to account for the delay due to the original input peripheral latches with respect to the observed outputs.
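By way of non-limiting illustration, the construction of Σ′ in the proof of Theorem 3 may be sketched as follows: for each input i, the first n_(i) entries are the initial latch values, the next r entries are the original inputs shifted by n_(i), and any remaining entries are don't-cares. The function name `build_abstract_sequence`, the use of `None` as the don't-care marker, and the symbolic example values are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def build_abstract_sequence(sigma: Sequence[Sequence[str]],
                            initial: Sequence[Sequence[str]],
                            dont_care: Optional[str] = None) -> List[Tuple[Optional[str], ...]]:
    """Build the abstract-model input sequence from the original one.

    sigma[k-1][i]   : original value of input i at step k (k = 1..r)
    initial[i][j-1] : initial value of the j-th input peripheral latch of input i
    """
    m = len(initial)
    r = len(sigma)
    total = r + max((len(v) for v in initial), default=0)
    sequence = []
    for k in range(1, total + 1):
        vec = []
        for i in range(m):
            n_i = len(initial[i])
            if k <= n_i:
                vec.append(initial[i][k - 1])      # initial latch value
            elif k <= r + n_i:
                vec.append(sigma[k - n_i - 1][i])  # original input, delayed by n_i
            else:
                vec.append(dont_care)              # past the end of input i
        sequence.append(tuple(vec))
    return sequence


# Hypothetical two-input example: input 0 has two latches removed, input 1 has none.
sigma = [("a1", "b1"), ("a2", "b2")]
initial = [["A1", "A2"], []]
abstract_sequence = build_abstract_sequence(sigma, initial)
```

For this example the result is `[("A1", "b1"), ("A2", "b2"), ("a1", None), ("a2", None)]`: each abstract vector pairs a value of input 0 with a value of input 1 taken from a different original vector, which is the mixing of vectors that arises when the numbers of removed latches are unequal.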

From Theorems 1, 2 and 3, it follows directly that both abstractions A_(eq) and A_(neq), when limited to removal of correctly positioned latches, preserve reachability and thereby error detection by a transition tour. Note that in practice Σ is not known a priori, since performing a transition tour on the original model should be avoided. Instead, during generation of the transition tour Σ′ on the abstract test model, the initial prefix captures the constraints imposed by the initial values of those input peripheral latches that are removed by the abstraction. This ensures that only the reachable part of the original state space is explored by the transition tour on the abstract state space.

Note that though the analysis has been presented in terms of abstraction from the original design implementation to a test model, it also holds for abstraction from any concrete test model to an abstract one. This can be quite useful in the context of a guided simulation methodology, where the concrete test model may itself be the result of abstracting the design implementation in some form. For example, for processor verification, a popular abstraction is to remove the datapath elements from the RTL design, and focus on the resulting controller. Naturally, in this case, the usefulness of our abstraction will depend on the extent to which design errors are contained in the concrete test model to start with.

The reason for differentiating between the two kinds of abstractions stems from the following methodological consideration. Consider the case where the test sequences generated from a transition tour of the abstract test model are used for simulation comparison between the RTL and behavior level descriptions, as shown in FIG. 1. Note that with the A_(neq) abstraction, the input vectors generated by a transition tour on the abstract test model do not correspond directly to input vectors applied to the original design. In fact, values from different input vectors in the abstract test sequence have to be put together to obtain an input vector for the original design sequence (the inverse of the construction of Σ′ in the proof of Theorem 3). Such a “jumbled-up” input vector at the RTL may not correspond to any obvious input scenario at the behavior level, thereby making it harder to ascertain the expected results. For example, in the case of processor verification, a jumbled-up input vector at RTL may consist of opcode bits from one vector, and register address bits from another, potentially leading to an invalid instruction input at the behavior level. However, if we use A_(eq), all inputs in the abstract vector sequence are delayed by the same amount when applied to the original. Thus input vectors are preserved across the various models, facilitating their use in simulation.
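The “jumbled-up” effect can be made concrete with a small sketch of the inverse mapping, which recovers the original design sequence from an abstract test sequence. The function name original_sequence and the list encoding are hypothetical; n[i] is assumed to be the number of input peripheral latches removed from input i, and None stands for a don't-care entry.

```python
def original_sequence(sigma_prime, n):
    """Recover the original-design sequence from the abstract one.

    Entry i of original vector k is taken from abstract vector k + n[i],
    so with unequal n[i] one original vector mixes several abstract vectors.
    """
    r = len(sigma_prime) - max(n)
    return [
        [sigma_prime[k + n_i][i] for i, n_i in enumerate(n)]
        for k in range(r)
    ]


# Abstract sequence for two inputs with n = [1, 2]; None is a don't care.
sigma_prime = [[0, 1], ["a1", 1], ["a2", "b1"], [None, "b2"]]
sigma = original_sequence(sigma_prime, n=[1, 2])
```

Here the first original vector combines a value from the second abstract vector with a value from the third. When all n[i] are equal, as under A_(eq), each original vector is simply one abstract vector shifted in time.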

The description below considers algorithms for MPR under both scenarios: abstraction of equal and of unequal numbers of input peripheral latches.

IVB. Maximal Peripheral Retiming

IVB1. Peripheral Retiming

The notion of moving the latches to the boundary of the circuit is related to the concept of peripheral retiming. For details on peripheral retiming see S. Malik, E. Sentovich, R. K. Brayton, and A. Sangiovanni-Vincentelli, Retiming and resynthesis: Optimizing sequential networks with combinational techniques, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 10(1):74-84, January 1991. Peripheral retiming was introduced to help define the maximal circuits which could be directly optimized using combinational logic techniques. More specifically, a peripheral retiming of a circuit is a retiming that results in all the latches migrating to the periphery of the circuit. This results in the remaining circuit being combinational and thus transformable by combinational logic optimization techniques. Not all circuits permit peripheral retiming; the exact conditions under which peripheral retiming is permitted are defined in the reference by Malik et al. cited above.

IVB2. Problem Formulation

According to the goals of the present invention it is desirable to move as many of the latches to the periphery as possible. It is likely that it will not be possible to move all of them to the periphery. Thus, the specific problem that needs to be solved is not exactly peripheral retiming, though it is somewhat related. Given a circuit C, a circuit C′ needs to be derived by a retiming such that the number of latches that are not at the periphery of C′ is minimized. This results in a reduction in the state space (and hence transition tour) of C′. This problem is referred to as Maximal Peripheral Retiming (MPR). Two flavors of this, MPR_(eq) and MPR_(neq), will be considered, corresponding to the abstractions A_(eq) and A_(neq), respectively. In MPR_(eq), only an equal number of latches at each primary input will eventually be abstracted, so the remainder must be counted as internal latches. In MPR_(neq) there is no such restriction.

Note that unlike peripheral retiming, all circuits have a maximal peripheral retiming. In the worst case, if none of the latches move to the periphery, then the final circuit is the same as the initial circuit.

A similar notion is also used in reducing the complexity of automatic test pattern generation. For details on automatic test pattern generation see R. Gupta, R. Gupta, and M. A. Breuer, BALLAST: A methodology for partial scan design, Proceedings of the International Symposium on Fault Tolerant Computing, pages 118-125, June 1989. Here the latches that cannot be peripherally retimed are selected as partial scan latches for test purposes. Also closely related to the present invention is the work by Balakrishnan and Chakradhar, which shows the application of latch-minimizing retiming in sequential test generation, along with the transformations for the test vectors induced by such retimings. See A. Balakrishnan and S. T. Chakradhar, Software transformations for sequential test generation, Fourth Asian Test Symposium, 1995.

IVB3. Algorithms for MPR

Extensions of standard retiming algorithms to handle MPR_(eq) and MPR_(neq) are straightforward. Consider MPR_(neq) first. In the original retiming formulation, Leiserson and Saxe provide support for different width busses, recognizing that if the output of a module is 32 bits wide and the inputs are only 16 bits wide, then the output register costs twice as much as the input register. See Charles E. Leiserson and James B. Saxe, Retiming Synchronous Circuitry, Algorithmica, 6(1):5-36, 1991. A straightforward application of this to MPR_(neq) is to consider the bus width of each peripheral edge (input and output) to be zero. Now, conventional retiming to minimize the number of latches in the circuit will minimize the number of internal latches by attempting to push as many as possible to the periphery, since those latches have zero cost.
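The zero-width formulation for MPR_(neq) can be illustrated on a toy circuit graph. The sketch below is not the retiming algorithm of Leiserson and Saxe: it replaces the usual linear-programming or min-cost-flow solution with exhaustive search over small lag values, assumes unit bus width on internal edges, ignores fanout sharing, and fixes the lag of primary input and output nodes at zero. All names (mpr_neq, retimed_weights, the node labels) are hypothetical.

```python
from itertools import product


def retimed_weights(edges, lag):
    """Latch count per edge after retiming, or None if any count is negative."""
    weights = []
    for u, v, w in edges:
        w_r = w + lag[v] - lag[u]   # standard retiming equation
        if w_r < 0:
            return None
        weights.append(w_r)
    return weights


def mpr_neq(edges, inputs, outputs, bound=2):
    """Exhaustively minimize internal latches with zero-cost peripheral edges."""
    fixed = set(inputs) | set(outputs)
    # Bus width 0 on peripheral edges, 1 on internal edges (an assumption).
    width = [0 if (u in fixed or v in fixed) else 1 for u, v, _ in edges]
    gates = sorted({n for u, v, _ in edges for n in (u, v)} - fixed)
    best = None
    for lags in product(range(-bound, bound + 1), repeat=len(gates)):
        lag = dict.fromkeys(fixed, 0)
        lag.update(zip(gates, lags))
        weights = retimed_weights(edges, lag)
        if weights is None:
            continue
        cost = sum(b * w for b, w in zip(width, weights))
        if best is None or cost < best[0]:
            best = (cost, weights)
    return best


# in -> g1 -(1 latch)-> g2 -> out, plus a self-loop latch on g2.
edges = [("in", "g1", 0), ("g1", "g2", 1), ("g2", "g2", 1), ("g2", "out", 0)]
cost, weights = mpr_neq(edges, inputs=["in"], outputs=["out"])
```

In this toy graph the pipeline latch between g1 and g2 can be pushed to the periphery, while the latch on the self-loop of g2 cannot, so one internal latch remains.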

MPR_(eq) is handled with a minor modification of the circuit before applying the above algorithm. A dummy node D₁ is added to the circuit with fanout to each primary input I_(i). Another dummy node D₀ is added with no fanin and a single fanout to D₁. The bus widths of the edges are as follows:

-   for the (D₀, D₁) edge it is 0
-   for the (D₁, I_(i)) edges it is infinite
-   for the output periphery edges it is 0
-   for all other edges, it is the same as in the original circuit.

This will force a minimization of internal latches in the circuit by trying to push as many latches as possible to the output periphery and onto the (D₀, D₁) edge. Note that the latches on the (D₀, D₁) edge have been obtained by retiming an equal number of latches from each of the input peripheral edges. Thus the pair of dummy nodes (D₀, D₁) captures the requirement that an equal number of latches be retimed at each input peripheral edge.
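The dummy-node construction for MPR_(eq) can be sketched in the same brute-force style. This is an illustrative approximation under stated assumptions, not a production retiming algorithm: lags are searched exhaustively within a small bound, an infinite bus width is treated as a hard constraint that the edge carries no latch, internal edges are given unit width, and fanout sharing is ignored. The names mpr_eq, D0 and D1 as string labels, and the example circuit are hypothetical.

```python
from itertools import product

INF = float("inf")


def mpr_eq(edges, inputs, outputs, bound=2):
    """Minimize internal latches when inputs must give up equal latch counts."""
    # Augmented graph: (u, v, latches, bus width).
    aug = [("D0", "D1", 0, 0)]
    aug += [("D1", i, 0, INF) for i in inputs]
    aug += [(u, v, w, 0 if v in outputs else 1) for u, v, w in edges]
    fixed = {"D0", *outputs}                      # lag fixed at zero
    free = sorted({n for u, v, _, _ in aug for n in (u, v)} - fixed)
    best = None
    for lags in product(range(-bound, bound + 1), repeat=len(free)):
        lag = dict.fromkeys(fixed, 0)
        lag.update(zip(free, lags))
        cost, weights = 0, {}
        for u, v, w, width in aug:
            w_r = w + lag[v] - lag[u]
            if w_r < 0 or (w_r > 0 and width == INF):
                cost = None                        # illegal or infinitely costly
                break
            weights[(u, v)] = w_r
            if w_r > 0:
                cost += width * w_r
        if cost is not None and (best is None or cost < best[0]):
            best = (cost, weights)
    return best


# Input a has two latches before gate g, input b has one.
edges = [("a", "g", 2), ("b", "g", 1), ("g", "out", 0)]
cost, weights = mpr_eq(edges, inputs=["a", "b"], outputs=["out"])
```

In this example only one latch per input can be abstracted, so the surplus latch on input a stays on its edge and is counted as internal.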

There may be several peripheral retimings obtained using the above formulations. Essentially, an equal number of latches may be taken from each primary input (output) and added to each primary output (input) while preserving functionality. Since the number of latches at the primary outputs is to be maximized, the peripheral retiming can be biased in that direction by picking appropriate bus widths (e.g. 0 for output peripheral edges, 1 for input peripheral edges, and some large number greater than the maximum fanin of any gate for internal edges). This also assures a unique peripheral latch configuration, i.e. the number of latches at each peripheral edge is uniquely specified, though the number of latches on internal edges may vary.

The constructions described above for MPR_(eq) and MPR_(neq) reduce these problems to standard retiming for minimizing the number of latches, which is a well-studied problem with several efficient practical implementations.

IVB4. Handling Enables and Multiple Clocks/Phases

In practical designs, different latches may be enabled under different enable conditions and clocked by different clock signals/phases. Legl et al. consider the problem of retiming for latches with multiple clocks and enables. See C. Legl, P. Vanbekbergen, and A. Wang, Retiming of edge-triggered circuits with multiple clocks and load enables, Notes of the International Workshop on Logic Synthesis, May 1997. Their solution is to consider classes of latches, where each class contains all the latches with exactly the same enables and clocks. Now, latches can be retimed across a gate (backward or forward) only if they all belong to the same class. While this algorithm is correct in that it will generate a circuit functionally equivalent to the original circuit, it may be pessimistic in that it may not permit all possible retimings. Consider a multi-phase (φ₁, φ₂, . . . , φ_(n)) clocking scheme. Let the clock period start at the leading edge of φ₁. Now consider a sub-circuit as shown in FIG. 7. (In this figure, and in others to follow, the standard practice of using rectangles to indicate latches and circles to indicate combinational logic blocks is used. Also, multi-bit signals which represent a single logical signal are shown schematically with a single latch, unless explicitly stated otherwise.) As shown, the three latches L₁, L₂, and L₃ are clocked by φ₁, with paths from L₁ to L₃ and also from L₂ to L₃. Let the path from L₁ to L₃ have a latch L₄ clocked by φ₂. The path from L₂ to L₃ has no other latch.

Under the class-based algorithm, L₄ will not be able to move forward. This is unnecessarily restrictive: a dummy latch L₅ can always be added, right after L₂, that permits the migration of L₄. This does not change the logical behavior of the circuit. It does change the clock period requirement, but that is immaterial in the current application. Thus, to maximize the ability to move latches to the periphery, paths with unbalanced phases are balanced by adding dummy latches as explained above. This step was explicitly needed in the case study presented below.
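The class-based rule and the effect of the dummy latch can be sketched as follows. This is a simplified illustration, not the algorithm of Legl et al.: a latch class is reduced to a string label, only forward movement across a single gate is modeled, and the function names are hypothetical.

```python
def can_move_forward(fanin_classes):
    """True if every gate input carries a latch and all share one class.

    fanin_classes holds, per gate input, the class (clock phase and enable)
    of the latch sitting directly on that input, or None if there is none.
    """
    return None not in fanin_classes and len(set(fanin_classes)) == 1


def balance_with_dummies(fanin_classes, dummy_class):
    """Insert a dummy latch of dummy_class on every input that lacks a latch."""
    return [c if c is not None else dummy_class for c in fanin_classes]


# Inputs of the logic feeding L3: the path from L1 carries L4 (phase phi2),
# the path from L2 carries no latch.
before = ["phi2", None]
after = balance_with_dummies(before, "phi2")   # adds the dummy latch L5

blocked = can_move_forward(before)             # L4 cannot move forward
unblocked = can_move_forward(after)            # L4 and L5 move together
```

Once the unbalanced path is padded with a dummy latch of the same class, the forward move that the class-based rule rejected becomes legal.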

IVC. Case Study: The DLX Processor

A Verilog RTL implementation (without floating-point and exception-handling instructions) of the popular DLX processor is used to show the practicability of the method according to the present invention. For details on the DLX processor, see J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann, 1990. The DLX processor uses a standard 5-stage pipeline consisting of the fetch, decode, execute, memory and write-back stages. The design also includes an interlock module, which implements data forwarding and load interlock for data hazards, and uses a branch-not-taken strategy for handling control hazards. For the present test design, the initial test model was obtained by manual removal of the datapath elements and of all signals pertaining to immediate data. This model is shown in FIG. 7, where the signals from/to the datapath are modeled as primary inputs/outputs, respectively. It consisted of 157 state elements, 39 primary inputs, and 40 primary outputs. Then, some generic abstractions were applied, such as considering a reduced set of registers (4 instead of 32), changing a one-hot encoded set of signals to a binary encoding, and retiming to avoid duplication of registers in the interlock module. These resulted in a test model with 92 latches, which is still too large for generating a transition tour.

IVC1. Abstraction Using MPR

Next, peripheral retiming was applied to abstract this model further, which yielded four interesting classes of latches, three of which could be abstracted out.

-   Class1: Latches at the input periphery: In typical pipelined designs, the fetch stage simply passes on the instruction to the decode stage, without looking at its contents whatsoever. In circuit terms, the latches used by the fetch stage for storing the instruction word appear at the input periphery of the model. Furthermore, these latches share a common enable signal, ¬(Stall ∨ Stall_for_Multiply), and are not part of any reconvergent paths. These can be abstracted out, saving 32 latches.

    It is interesting to compare this abstraction to the late bifurcation technique used by Lewin et al. to abstract the fetch stage. See D. Lewin, D. Lorenz, and S. Ur, A methodology for processor implementation verification, Proceedings of the Int. Conf. on Formal Methods in CAD, pages 126-142, November 1996. Late bifurcation is used to break up the instruction into sections that are used appropriately by different stages of the pipeline model. However, this gives rise to the problem of consistency, i.e. ensuring that information from the same instruction persists as it advances through the different stages. They deal with this problem by requiring consistency during generation of the architectural tests from their model. In the approach used in the present invention, consistency is guaranteed implicitly by the methodology. For example, test sequences are generated from the test model in terms of inputs to the decode stage, which are applied one time step earlier with respect to the observed outputs during simulation.

-   Class2: Latches at the output periphery: Recall that the primary outputs of the test model, in most cases, are control signals to the datapath. Pipelined designs typically use synchronizing latches on such control signals, which appear at the output periphery of the test model. For the DLX design, there were 31 such latches, including the entire writeback stage. Since they had a common enable (always enabled), and did not belong to any reconvergent paths, they were all abstracted out. In effect, this removed output latches from different stages and completely removed the writeback stage of the pipeline.

    For example, the outputs from the memory stage of our DLX design are shown in FIG. 10, where LSSC, DMR and DMW are control signals for the data memory representing the load-store size code, read, and write signals, respectively; and ALUConst is used to force the delayed ALU output register to a constant in case of an address mis-alignment. Note that the correct synchronization of these signals with the rest of the datapath will be automatically checked during functional simulation.

-   Class3: Latches retimed to move to the periphery: The more interesting latches are those that are not already at the periphery, but can be moved there by retiming. For the present DLX design, savings of 8 latches were obtained from three sets of signals, as shown in FIG. 8:

    -   Imm_Ctrl: used in the datapath to select the immediate data as the second source for the ALU.
    -   AFC: used as the function code for the ALU in the datapath.
    -   LSSC: represents the load-store size code for the data memory. During the process of maximal peripheral retiming, it was found that 2 of the 3 bits from LSSC_E are used by the memory stage to check for address alignment, due to which they could not be retimed forward. However, the unused bit could be moved to the periphery.

-   Class4: Unretimable latches: It is also interesting to examine the remaining latches which could not be peripherally retimed. There are two main structures that prevent such retiming: (i) self-loops and (ii) reconvergent paths with different numbers of latches. Typical examples of these are shown for the DLX design in FIG. 12, where FIG. 12(a) shows the self-loop for Dump (for dumping the fetched instruction in case of a taken branch), and FIG. 12(b) shows the reconvergent structure for S1/S2_Mux_Select (used to select the ALU sources).

    For the DLX design, there were 21 latches that could not be retimed to the periphery.

IVC2. Final Test Model

Of the 157 original latches, 72 were abstracted out by use of maximal peripheral retiming. Corresponding abstraction over the inputs/outputs (due to removal of immediate data fields and shortening of register addresses) resulted in a final test model of 21 latches, 25 primary inputs and 31 primary outputs. VIS was used to convert the Verilog description to an FSM description, which was further used as input to SIS. See R. K. Brayton et al., VIS: A system for verification and synthesis, Technical Report UCS/ERL M95, Electronics Research Lab, University of California, Berkeley, Calif., December 1995; also to appear in Proceedings of the Conference on Computer-Aided Verification, 1996; and E. M. Sentovich, K. J. Singh, C. Moon, H. Savoj, R. K. Brayton, and A. Sangiovanni-Vincentelli, Sequential circuit design using synthesis and optimization, Proceedings of the IEEE International Conference on Computer Design, 1992. Within SIS, the implicit transition relation representation of the final model was obtained in about 10 seconds on a 166 MHz UltraSparc workstation with 64 MB main memory.

IVD. MPR on Benchmark Examples

The MPR algorithms according to the present invention were applied to standard benchmark circuits from the ISCAS89 benchmark suite to estimate their effectiveness on ostensibly unstructured logic circuits from random sources. See F. Brglez, D. Bryan, and K. Kozminski, Combinational Profiles of Sequential Benchmark Circuits, Proceedings of the International Symposium on Circuits and Systems, Portland, Oregon, May 1989. The MPR algorithm according to the present invention was easily implemented in the SIS retiming package by assigning zero cost to any latch moving to either the primary input or primary output. See E. M. Sentovich, K. J. Singh, C. Moon, H. Savoj, R. K. Brayton, and A. Sangiovanni-Vincentelli, Sequential circuit design using synthesis and optimization, Proceedings of the IEEE International Conference on Computer Design, 1992. When only the same number of latches was allowed to be removed from each input/output, MPR followed by removal of the latches at the periphery produced no reduction in the number of latches. On the other hand, when arbitrary numbers of latches were allowed to be removed from the inputs/outputs, 9 of the 23 ISCAS89 circuits for which retiming is computationally viable using the SIS retiming package showed a reduction in the number of latches with MPR followed by removal of the latches at the periphery. The circuits and their latch reductions are given in Table 1, shown in FIG. 13. This result is especially significant since these circuits represent completely unstructured random logic. Much better results can be expected on circuits with more pipelining. The experiment was carried out on a 166 MHz UltraSparc with 64 MB of main memory and took on the order of a few minutes to complete.

Other modifications and variations to the invention will be apparent to those skilled in the art from the foregoing disclosure and teachings. Thus, while only certain embodiments of the invention have been specifically described herein, it will be apparent that numerous modifications may be made thereto without departing from the spirit and scope of the invention.

1. A method for design validation comprising: (a) generating a state-based test model of the design; (b) retiming said test model to produce a retimed test model; (c) abstracting said retimed test model to produce an abstracted test model; and (d) applying a validation technique on the abstracted test model.